Multi Type Mean Field Reinforcement Learning
Subramanian, Sriram Ganapathi, Poupart, Pascal, Taylor, Matthew E., Hegde, Nidhi
Mean field theory provides an effective way of scaling multiagent reinforcement learning algorithms to environments with many agents, which can be abstracted by a virtual mean agent. In this paper, we extend mean field multiagent algorithms to multiple types. Types relax a core assumption of mean field games: that all agents in the environment play nearly similar strategies and share the same goal. We conduct experiments on three different testbeds for many-agent reinforcement learning, based on the standard MAgent framework. We consider two kinds of mean field games: a) games where agents belong to predefined types that are known a priori, and b) games where the type of each agent is unknown and must therefore be learned from observations. We introduce new algorithms for each kind of game and demonstrate their superior performance over state-of-the-art algorithms that assume all agents belong to the same type, as well as over other baseline algorithms in the MAgent framework.
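The abstract above centers on a mean field of neighboring agents' actions, split by type. A minimal sketch of computing one mean action per type is given below; this is an illustrative reconstruction, not the authors' implementation, and the function name `type_mean_actions` and the uniform fallback for empty neighborhoods are assumptions:

```python
import numpy as np

def type_mean_actions(neighbor_actions, neighbor_types, num_actions, types):
    """For each agent type, average the one-hot encodings of the actions
    taken by neighbors of that type, yielding one mean action per type."""
    means = {}
    for t in types:
        acts = [a for a, ty in zip(neighbor_actions, neighbor_types) if ty == t]
        if acts:
            one_hot = np.zeros((len(acts), num_actions))
            one_hot[np.arange(len(acts)), acts] = 1.0
            means[t] = one_hot.mean(axis=0)
        else:
            # Assumption: with no neighbors of this type, fall back to a
            # uniform distribution over actions.
            means[t] = np.full(num_actions, 1.0 / num_actions)
    return means
```

Each per-type mean action would then be fed, alongside the state, into that type's Q-function, rather than pooling all neighbors into a single mean as in single-type mean field methods.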
TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow
Hafner, Danijar, Davidson, James, Vanhoucke, Vincent
We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than on individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel without interference from the global interpreter lock. As part of this project, we introduce BatchPPO, an efficient implementation of the proximal policy optimization algorithm. By open-sourcing TensorFlow Agents, we hope to provide a flexible starting point for future projects that accelerates research in the field.
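The batching idea in the abstract can be sketched as a wrapper that steps several environments and stacks their results so the network sees one batch. This is a simplified in-process sketch only: the actual TensorFlow Agents library runs environments in separate Python processes to sidestep the GIL, and the `Env` interface assumed here (`reset() -> obs`, `step(action) -> (obs, reward, done)`) is hypothetical:

```python
import numpy as np

class CountEnv:
    """Toy environment: the observation counts steps; done after 3 steps."""
    def __init__(self):
        self._t = 0
    def reset(self):
        self._t = 0
        return np.array([0.0])
    def step(self, action):
        self._t += 1
        return np.array([float(self._t)]), float(action), self._t >= 3

class BatchEnv:
    """Step a list of environments together and stack the results,
    so policy/value networks operate on a batch of observations."""
    def __init__(self, envs):
        self._envs = envs
    def reset(self):
        return np.stack([env.reset() for env in self._envs])
    def step(self, actions):
        obs, rewards, dones = zip(
            *[env.step(a) for env, a in zip(self._envs, actions)])
        return np.stack(obs), np.array(rewards), np.array(dones)
```

Replacing the in-process loop with `multiprocessing` workers would recover the parallelism the abstract describes, at the cost of the inter-process plumbing this sketch omits.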
Dynamic Goal Recognition Using Windowed Action Sequences
Menager, David (University of Kansas) | Choi, Dongkyu (University of Kansas) | Floyd, Michael W. (Knexus Research Corporation) | Task, Christine (Knexus Research Corporation) | Aha, David W. (Naval Research Laboratory)
Recent advances in robotics and artificial intelligence have brought a variety of assistive robots designed to help humans accomplish their goals. However, many have limited autonomy and lack the ability to seamlessly integrate with human teams. One capability that can facilitate such human-robot teaming is the robot's ability to recognize its teammates' goals, and react appropriately. This function permits the robot to actively assist the team and avoid performing redundant or counterproductive actions.

In goal recognition, the basic problem domain consists of the following:
- a set E of environment fluents;
- a state S that is a value assignment to those fluents;
- a set A of actions that describe potential transitions between states (with preconditions and effects defined over E, and parameterized over a set of environment objects O); and
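The fluents/state/action decomposition in the goal-recognition entry above can be made concrete with a small sketch. This is an illustrative encoding under assumed names (`Action`, `applicable`, `apply_action`), not the representation used in the paper; states are modeled as plain fluent-to-value dictionaries:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    """An action over fluents E: preconditions a state must satisfy,
    and effects that overwrite fluent values when it executes."""
    name: str
    preconditions: dict  # fluent -> required value
    effects: dict        # fluent -> resulting value

def applicable(action, state):
    """An action applies when the state satisfies every precondition."""
    return all(state.get(f) == v for f, v in action.preconditions.items())

def apply_action(action, state):
    """Executing an action yields a new state with its effects applied."""
    assert applicable(action, state)
    return {**state, **action.effects}
```

A goal recognizer would observe a windowed sequence of such actions and score candidate goals by how well each explains the transitions observed so far.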